An algorithm is proposed to optimize quantum Monte Carlo (QMC) wave functions based on Newton's
method and analytical computation of the first and second derivatives of the variational energy. This direct
application of the variational principle yields significantly lower energy than variance minimization methods
when applied to the same trial wave function. Quadratic convergence to the local minimum of the variational
parameters is achieved. A general theorem is presented, which substantially simplifies the analytic expressions
of derivatives in the case of wave function optimization. To demonstrate the method, the ground state energies
of the first-row elements are calculated.

I. INTRODUCTION

Quantum Monte Carlo is a powerful method of solving the Schrödinger equation. QMC treats many-body correlation in an
efficient and flexible way, enabling highly accurate studies of atoms, small molecules and clusters. A high-quality trial wave
function is crucial to the calculation, since the trial function determines the ultimate accuracy one can achieve in variational
Monte Carlo (VMC) and fixed-node diffusion Monte Carlo, and trial function quality dramatically affects the efficiency of the
computation.

An algorithm which efficiently and reliably optimizes wave functions is a critical tool for VMC calculations. One straightforward
approach for improving the VMC wave function is to perform energy minimization, in which the variational parameters
are altered with the goal of lowering the expectation value of the energy. This approach is complicated in VMC because of the
uncertainties associated with stochastic sampling. In order to determine whether a new set of parameters yields a lower energy
than the current set, one needs to sample a large number of configurations to ensure that the energy difference between the two
sets of parameters is actually larger than the energy error bars. Correlated sampling methods are frequently used to improve
the efficiency of energy minimization. Typically, the energy is calculated using identical sampling points in configuration space
for two trial wave functions which differ by a single parameter. The process is repeated for each parameter, and steepest-descent
techniques are commonly used for parameter updating. This correlated sampling approach requires a significant amount of
memory (to store data for every sampling point), and the numerical differentiation $\Delta E/\Delta c$ requires many extra evaluations of
the local energy. For systems with a large number of parameters, numerical evaluation of the required derivatives becomes
computationally intractable. Analytical energy derivative techniques are very seldom used in current VMC calculations. We will
concentrate on this in the following sections.
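The correlated sampling idea described above can be illustrated with a minimal sketch on a toy problem that is not part of the original calculations. We assume a one-dimensional harmonic oscillator, $H = -\frac{1}{2}d^2/dx^2 + \frac{1}{2}x^2$, with the one-parameter trial function $\Psi_a = \exp(-a x^2)$, for which $E_L = a + x^2(\frac{1}{2} - 2a^2)$ and $E(a) = a/2 + 1/(8a)$. The function names and the step `delta` are hypothetical choices. Both energies are evaluated on the same sample points, with the shifted wave function handled by reweighting.

```python
import math
import random

def local_energy(x, a):
    # Local energy of exp(-a x^2) for H = -1/2 d^2/dx^2 + 1/2 x^2 (toy model, an assumption).
    return a + x * x * (0.5 - 2.0 * a * a)

def correlated_energy_difference(a, delta, samples):
    """Estimate E(a + delta) - E(a) on one shared set of samples drawn from Psi_a^2."""
    e_ref = sum(local_energy(x, a) for x in samples) / len(samples)
    # Reweight the same points to the shifted density: Psi_{a+delta}^2 / Psi_a^2.
    weights = [math.exp(-2.0 * delta * x * x) for x in samples]
    e_new = sum(w * local_energy(x, a + delta) for w, x in zip(weights, samples)) / sum(weights)
    return e_new - e_ref

rng = random.Random(0)
a, delta = 0.8, 0.01
# Psi_a^2 is a Gaussian with variance 1/(4a), so it can be sampled directly in this toy case.
samples = [rng.gauss(0.0, math.sqrt(1.0 / (4.0 * a))) for _ in range(20000)]
slope = correlated_energy_difference(a, delta, samples) / delta
exact_slope = 0.5 - 1.0 / (8.0 * a * a)   # derivative of E(a) = a/2 + 1/(8a)
```

Each additional parameter needs its own reweighted pass over the stored samples, which is the memory and cost burden noted above.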

A successful alternative approach has been developed which focuses on lowering the variance of the local energy, $H\Psi/\Psi$.
If the wave function $\Psi$ were the exact ground eigenstate, the local energy would be a constant with a variance of zero. A
major strength of the variance minimization approach is that the quantity to be minimized has a minimum value which is known
a priori (unlike energy minimization). This idea has been implemented in various ways and has recently become a nearly
universal approach in VMC wave function optimizations. Typically, one calculates first derivatives of the local energy variance
analytically. Steepest-descent techniques or a combination of analytic first derivatives with approximate expressions for the
second derivatives are then used for wave function variance reduction (a least-squares fit). Although variance methods have
the remarkable strength of an a priori minimum value of zero, it is much harder to compute the first and second derivatives
of the variance analytically than those of the variational energy. Therefore, approximate analytical derivatives beyond
first order are used in real calculations, and to our knowledge the validity of these approximations has not been discussed within
the scope of VMC wave function optimization. It is important to point out that the "direction set" minimum-searching methods,
such as steepest-descent and conjugate gradient, are not efficient for wave function optimization in VMC, because these line-minimization
techniques require at least one order of magnitude more evaluations of the local energy along the search directions.
Moreover, variance minimization is actually an indirect method, since a smaller variance does not necessarily correspond to a
lower energy, and the main goal of variational methods such as VMC is the lowest possible upper bound to the energy.

Correlated sampling can be used (instead of analytic derivatives) to lower the variance of the local energy. One excellent version
of this idea is known as the fixed-sample method. In this approach, the sampling points for the objective function (variance
of the local energy in this case) are fixed during the optimization procedure, which makes it possible to reduce stochastic noise
during the optimization. In addition, it has been observed from a few preliminary calculations that the number of configurations
sufficient for parameter updating does not increase rapidly with system size. The use of very complex trial correlation
functions has yielded highly accurate energies for a few first-row atoms. However, this fixed-sample procedure can have
problems if the variational parameters affect the nodes, since the density ratio of the current and initial trial wave functions
diverges frequently in the region around the nodes of the trial wave function. Even worse, this density ratio increases exponentially
with the size of the system. Although manually setting an upper bound for the weights or introducing a nodeless sampling
probability density function can overcome the singularities in the fixed distribution, a general and appropriate form for the
positive definite functions is still unavailable. In addition, the variational energy from fixed-sample calculations can be sensitive
to the choice of reference energy, sample size, and convergence criteria.

The method we present involves updating the variational parameters to lower the energy expectation value, guided by the force
vectors and Hessian matrix of the variational energy with respect to variational parameters. Generally it converges quadratically,
making it more efficient than the steepest-descent or quasi-Newton techniques employed in the variance minimization
procedure. In most cases, the best set of parameters can be obtained after only one or two iterations. Beginning with an identical
trial wave function and the same variational parameters, the correlation energies obtained from our method are significantly
better than results in the literature. With this approach, we also demonstrate the ability to optimize a wave function with a large
number of parameters. All of the data are collected and compared in Section IV.



II. VMC AND OPTIMIZATION ALGORITHM 



Variational Monte Carlo allows us to take our physical insights and construct a trial wave function $\Psi_T$ containing a set of
variational parameters $\{c_m\}$. The parameters are varied with the goal of reducing the energy expectation value. In VMC, the
Rayleigh-Ritz quotient gives an upper bound to the true ground state energy $E_0$ and is estimated by averaging over $N$ sampled configurations:

$$E_0 \le E_T(\{c_m\}) = \frac{\int \Psi_T^*(\{c_m\})\, H\, \Psi_T(\{c_m\})\, d\tau}{\int \Psi_T^*(\{c_m\})\, \Psi_T(\{c_m\})\, d\tau} \approx \frac{1}{N}\sum_{i=1}^{N} E_L(\sigma_i),$$

where $E_L = H\Psi_T/\Psi_T$ is called the local energy and $\sigma_i$ is a configuration-space point, visited with relative probability $\Psi_T^*\Psi_T$,
the density of the trial wave function at $\sigma_i$.
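As a minimal sketch of how such an average is accumulated, consider again a toy problem that is not part of the original calculations: a one-dimensional harmonic oscillator with the assumed trial function $\Psi_a = \exp(-a x^2)$, whose local energy is $E_L = a + x^2(\frac{1}{2} - 2a^2)$. The function name `vmc_energy`, the step size and the number of steps are hypothetical choices. The walk visits points with relative probability $\Psi_a^2$ and averages $E_L$ over them.

```python
import math
import random

def local_energy(x, a):
    # E_L = H Psi / Psi for Psi = exp(-a x^2) and H = -1/2 d^2/dx^2 + 1/2 x^2 (toy model).
    return a + x * x * (0.5 - 2.0 * a * a)

def vmc_energy(a, n_steps=20000, step=1.0, seed=1):
    """Metropolis walk on the density Psi^2 = exp(-2 a x^2); returns the mean local energy."""
    rng = random.Random(seed)
    x, total = 0.0, 0.0
    for _ in range(n_steps):
        trial = x + step * (2.0 * rng.random() - 1.0)
        # Accept with probability min(1, Psi(trial)^2 / Psi(x)^2).
        if rng.random() < math.exp(-2.0 * a * (trial * trial - x * x)):
            x = trial
        total += local_energy(x, a)
    return total / n_steps

energy_exact_param = vmc_energy(0.5)   # a = 1/2 is the exact ground state: E_L is constant
energy_other_param = vmc_energy(0.8)   # any other a gives an upper bound above 1/2
```

At $a = 1/2$ the local energy is the same at every point, so the estimate carries no statistical noise, which is the zero-variance property that variance minimization exploits.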

In a bound molecular system with fixed nuclei, the non-relativistic Hamiltonian

$$H = -\frac{1}{2}\sum_i \nabla_i^2 - \sum_{i,I} \frac{Z_I}{|\mathbf{r}_i - \mathbf{R}_I|} + \sum_{i<j} \frac{1}{|\mathbf{r}_i - \mathbf{r}_j|}$$

has inversion symmetry. (Note that capital letter subscripts refer to nuclei and lower-case letters refer to electrons.) Therefore,
the true ground-state wave function of this class of Hamiltonian can generally be constructed without an imaginary part, i.e.,

$$\Psi_T^*(\{c_m\}) = \Psi_T(\{c_m\}).$$

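In this case, the expectation value of the energy and the first derivative of the energy with respect to a variational parameter can
be written as

$$E = \frac{\int \Psi_T H \Psi_T\, d\tau}{\int \Psi_T^2\, d\tau},$$

$$\frac{\partial E}{\partial c_m} = \frac{1}{\int \Psi_T^2\, d\tau}\left(\int \frac{\partial \Psi_T}{\partial c_m} H \Psi_T\, d\tau + \int \Psi_T H \frac{\partial \Psi_T}{\partial c_m}\, d\tau\right) - \frac{1}{\left(\int \Psi_T^2\, d\tau\right)^2} \int \Psi_T H \Psi_T\, d\tau \int 2\Psi_T \frac{\partial \Psi_T}{\partial c_m}\, d\tau. \qquad (1)$$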
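Because

$$\int \frac{\partial \Psi_T}{\partial c_m} H \Psi_T\, d\tau = \int \Psi_T H \frac{\partial \Psi_T}{\partial c_m}\, d\tau$$

for real wave functions, we simplify Eq. (1) and obtain

$$\frac{\partial E}{\partial c_m} = 2\left[\left\langle E_L\, \Psi'_{\ln,m} \right\rangle - \left\langle E_L \right\rangle \left\langle \Psi'_{\ln,m} \right\rangle\right], \qquad (2)$$

where we define

$$\Psi'_{\ln,m} \equiv \frac{\partial \ln \Psi_T}{\partial c_m} = \frac{1}{\Psi_T}\frac{\partial \Psi_T}{\partial c_m},$$

and $\langle \cdot \rangle$ denotes an average over configurations sampled from $\Psi_T^2$.

We note that evaluating the different terms of this formula as finite sums over the same set of sampled configurations makes
the computation more efficient and reduces the fluctuations, in the sense of correlated sampling.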
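A minimal sketch of the resulting force estimator, $\partial E/\partial c = 2[\langle E_L\, \partial_c \ln\Psi\rangle - \langle E_L\rangle\langle \partial_c \ln\Psi\rangle]$, is given below for a toy problem that is not part of the original calculations. We assume a one-dimensional harmonic oscillator with trial function $\Psi_a = \exp(-a x^2)$, so that $\partial \ln\Psi/\partial a = -x^2$ and the exact derivative is $1/2 - 1/(8a^2)$. The function name `energy_gradient` is hypothetical. All three averages are accumulated over the same samples.

```python
import math
import random

def energy_gradient(a, samples):
    """Covariance form: dE/da = 2 * ( <E_L * dlnPsi/da> - <E_L> * <dlnPsi/da> )."""
    n = len(samples)
    e_loc = [a + x * x * (0.5 - 2.0 * a * a) for x in samples]   # local energy (toy model)
    dlog = [-x * x for x in samples]                              # d ln(Psi)/da for exp(-a x^2)
    mean_e = sum(e_loc) / n
    mean_d = sum(dlog) / n
    mean_ed = sum(e * d for e, d in zip(e_loc, dlog)) / n
    return 2.0 * (mean_ed - mean_e * mean_d)

rng = random.Random(2)
a = 0.8
# Psi^2 is Gaussian with variance 1/(4a); direct sampling replaces the Metropolis walk here.
samples = [rng.gauss(0.0, math.sqrt(1.0 / (4.0 * a))) for _ in range(50000)]
grad = energy_gradient(a, samples)
exact = 0.5 - 1.0 / (8.0 * a * a)   # derivative of E(a) = a/2 + 1/(8a)
```

No second wave function and no reweighting are required: one pass over the samples yields the derivative with respect to every parameter at once.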

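Similarly, one can compute the second derivatives of the variational energy with respect to the variational parameters as

$$\frac{\partial^2 E}{\partial c_m \partial c_n} = 2\Big\{ \left\langle E_L \Psi''_{\ln,mn} \right\rangle - \left\langle E_L \right\rangle \left\langle \Psi''_{\ln,mn} \right\rangle + 2\left[ \left\langle E_L \Psi'_{\ln,m} \Psi'_{\ln,n} \right\rangle - \left\langle E_L \right\rangle \left\langle \Psi'_{\ln,m} \Psi'_{\ln,n} \right\rangle \right] - \left\langle \Psi'_{\ln,m} \right\rangle \frac{\partial E}{\partial c_n} - \left\langle \Psi'_{\ln,n} \right\rangle \frac{\partial E}{\partial c_m} + \left\langle \Psi'_{\ln,m} E'_{L,n} \right\rangle \Big\},$$

with $\Psi'_{\ln,m} = \partial \ln \Psi_T/\partial c_m$, $\Psi''_{\ln,mn} = \partial^2 \ln \Psi_T/\partial c_m \partial c_n$ and $E'_{L,n} = \partial E_L/\partial c_n$, where $\langle \cdot \rangle$ is an average over configurations sampled from $\Psi_T^2$.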
We perform a standard Metropolis walk with importance sampling for $E$ and its first and second derivatives. This gives
numerical values for the force vector $\mathbf{b}$ and Hessian matrix $\mathbf{H}$, which are defined as

$$b_m = \frac{\partial E}{\partial c_m}$$

and

$$H_{mn} = \frac{\partial^2 E}{\partial c_m \partial c_n}.$$

The parameters are then updated according to

$$\mathbf{c}_{\rm next} = \mathbf{c}_{\rm cur} - \mathbf{H}^{-1} \cdot \mathbf{b}$$

until converged. Here $\mathbf{c}_{\rm cur}$ and $\mathbf{c}_{\rm next}$ stand for the current and next values of the trial parameter set, respectively.
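The update loop can be sketched for a single parameter, where $\mathbf{H}^{-1}\cdot\mathbf{b}$ reduces to a scalar division. The example is a toy problem, not the atomic calculations reported here: we assume a one-dimensional harmonic oscillator with $\Psi_a = \exp(-a x^2)$, whose minimum lies at $a = 1/2$. The function names, sample size and starting value are hypothetical. The Hessian is estimated from the covariance-type expression obtained by differentiating the force formula once more; for this trial function $\partial^2 \ln\Psi/\partial a^2 = 0$, so those terms drop out.

```python
import math
import random

def energy_gradient_hessian(a, samples):
    """Sample estimates of E, dE/da and d2E/da2 for Psi = exp(-a x^2) (toy model)."""
    n = len(samples)

    def mean(values):
        return sum(values) / n

    e_loc = [a + x * x * (0.5 - 2.0 * a * a) for x in samples]   # local energy
    dlog = [-x * x for x in samples]                              # d ln(Psi) / da
    de_loc = [1.0 - 4.0 * a * x * x for x in samples]             # d E_L / da
    energy = mean(e_loc)
    grad = 2.0 * (mean([e * d for e, d in zip(e_loc, dlog)]) - energy * mean(dlog))
    hess = 2.0 * (
        2.0 * (mean([e * d * d for e, d in zip(e_loc, dlog)])
               - energy * mean([d * d for d in dlog]))
        - 2.0 * mean(dlog) * grad
        + mean([d * de for d, de in zip(dlog, de_loc)])
    )
    return energy, grad, hess

def newton_optimize(a, n_iter=5, n_samples=20000, seed=3):
    """Repeat c_next = c_cur - b / H with fresh samples drawn at each iteration."""
    rng = random.Random(seed)
    history = []
    for _ in range(n_iter):
        sigma = math.sqrt(1.0 / (4.0 * a))     # Psi^2 is Gaussian with variance 1/(4a)
        samples = [rng.gauss(0.0, sigma) for _ in range(n_samples)]
        energy, grad, hess = energy_gradient_hessian(a, samples)
        history.append((a, energy))
        a = a - grad / hess
    return a, history

a_opt, history = newton_optimize(0.6)
```

With several parameters, `grad` becomes the vector $\mathbf{b}$, `hess` the matrix $\mathbf{H}$, and the division is replaced by the solution of a linear system.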
III. THEOREM OF LOCAL OBSERVABLE QUANTITY DERIVATIVE



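We now demonstrate that the expectation value of the first derivative of the local value $O_L = \hat{O}\Psi/\Psi$ of any Hermitian
operator $\hat{O}$ with respect to any real parameter $c$ in any real wave function $\Psi$ is always zero, i.e.,

$$\left\langle \frac{\partial O_L}{\partial c} \right\rangle = \frac{\int \Psi^2\, \frac{\partial}{\partial c}\!\left(\frac{\hat{O}\Psi}{\Psi}\right) d\tau}{\int \Psi^2\, d\tau} = 0. \qquad (3)$$

Explicitly, the left hand side of Eq. (3) is

$$\frac{\int \Psi^2 \left( \frac{1}{\Psi}\hat{O}\frac{\partial \Psi}{\partial c} - \frac{\hat{O}\Psi}{\Psi^2}\frac{\partial \Psi}{\partial c} \right) d\tau}{\int \Psi^2\, d\tau} = \frac{\int \Psi\, \hat{O} \frac{\partial \Psi}{\partial c}\, d\tau - \int \frac{\partial \Psi}{\partial c}\, \hat{O} \Psi\, d\tau}{\int \Psi^2\, d\tau} = 0,$$

where the last step follows from the Hermiticity of $\hat{O}$ for real wave functions.

This theorem explains the simplicity of Eq. (2): the first-order change of the expectation value with respect to a change of
parameter comes only from the change of the wave function and the Metropolis sampling weights, not from the change of the
quantity itself (e.g., the local energy).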
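The statement can be checked numerically on a toy problem that is not part of the original calculations. We assume a one-dimensional harmonic oscillator with $\Psi_a = \exp(-a x^2)$, for which $E_L = a + x^2(\frac{1}{2} - 2a^2)$ and therefore $\partial E_L/\partial a = 1 - 4a x^2$. Since $\langle x^2\rangle = 1/(4a)$ under $\Psi_a^2$, the average of this derivative should vanish for every $a$, not only at the minimum. The function name is hypothetical.

```python
import math
import random

def mean_local_energy_derivative(a, n_samples=50000, seed=4):
    """Average of dE_L/da = 1 - 4 a x^2 over points drawn from Psi^2 = exp(-2 a x^2)."""
    rng = random.Random(seed)
    sigma = math.sqrt(1.0 / (4.0 * a))   # Psi^2 is Gaussian with variance 1/(4a)
    total = 0.0
    for _ in range(n_samples):
        x = rng.gauss(0.0, sigma)
        total += 1.0 - 4.0 * a * x * x
    return total / n_samples

# The average is statistically zero both at the optimum (a = 0.5) and away from it.
checks = {a: mean_local_energy_derivative(a) for a in (0.3, 0.5, 0.8)}
```

Because this term averages to zero, it can be omitted from the force estimator, which removes a source of noise without introducing bias.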
IV. APPLICATIONS AND DISCUSSION

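To test the performance of this new analytic energy minimization scheme, a well-known trial wave function is used in the
calculations. Explicitly, the trial wave function is expressed as

$$\Psi_T = D^{\uparrow} D^{\downarrow} F,$$

$$F = \exp\Big( \sum_{I,\, i<j} U_{Iij} \Big),$$

$$U_{Iij} = \sum_{k}^{N_I} c_{kI} \left( \bar{r}_{iI}^{\,m_{kI}}\, \bar{r}_{jI}^{\,n_{kI}} + \bar{r}_{jI}^{\,m_{kI}}\, \bar{r}_{iI}^{\,n_{kI}} \right) \bar{r}_{ij}^{\,o_{kI}},$$

$$\bar{r}_{iI} = \frac{b_I r_{iI}}{1 + b_I r_{iI}}, \qquad \bar{r}_{ij} = \frac{d_I r_{ij}}{1 + d_I r_{ij}},$$

where $D^{\uparrow}$ and $D^{\downarrow}$ are the Hartree-Fock up-spin and down-spin Slater determinants in a converged STO basis set and $F$ is a
positive correlation wave function. The $m_{kI}$, $n_{kI}$ and $o_{kI}$ are taken to be integers. All of the parameters $c_{kI}$, $b_I$ and $d_I$ can be
optimized to obtain the lowest energy.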
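A minimal sketch of how one such correlation term might be evaluated is given below, assuming a single nucleus and one electron pair. The function names `scaled` and `u_term` and the tuple layout `(c, m, n, o)` are hypothetical choices, not taken from any existing code. The example also checks that a lone $(m, n, o) = (0, 0, 1)$ term with $c = 0.25$ and $d = 1$ gives a slope of $1/2$ in $r_{ij}$ at coalescence, which is the unlike-spin electron-electron cusp value.

```python
def scaled(r, scale=1.0):
    """Scaled distance scale*r / (1 + scale*r), which runs from 0 to 1."""
    return scale * r / (1.0 + scale * r)

def u_term(r_i, r_j, r_ij, terms, b=1.0, d=1.0):
    """Sum over (c, m, n, o) of c * (ri^m rj^n + rj^m ri^n) * rij^o in scaled distances."""
    ri, rj, rij = scaled(r_i, b), scaled(r_j, b), scaled(r_ij, d)
    return sum(c * (ri ** m * rj ** n + rj ** m * ri ** n) * rij ** o
               for c, m, n, o in terms)

# Only the cusp term: c = 0.25 with (m, n, o) = (0, 0, 1).
cusp_only = [(0.25, 0, 0, 1)]
h = 1e-6
# Finite-difference slope of U with respect to r_ij at electron-electron coalescence.
slope_at_coalescence = (u_term(1.0, 1.0, h, cusp_only)
                        - u_term(1.0, 1.0, 0.0, cusp_only)) / h
```

The symmetrized bracket contributes a factor of two for $m = n = 0$, which is why the cusp condition fixes this coefficient at 0.25 when $d$ is set to unity.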

With our method, a configuration size of 200,000 sampling points is normally enough for satisfactory optimization
of the first-row atoms. Typically, one or two iterations are sufficient for convergence, requiring about fifty CPU hours on an SGI
90 MHz R8000 processor. Electrons are moved one by one with a time step chosen to maintain an acceptance ratio of 80%. In
order to generate one independent sample point, a block size of twenty sequential steps is used.

To make a comparison with the variance minimization method, we choose the same set of nine parameters as Schmidt and
Moskowitz, with all zeroes as initial values. We also obey their constraints, enforcing the unlike-spin electron-electron cusp
condition and setting $b_I$ and $d_I$ to unity. The optimized wave function and energy are shown in Tables I and II. The calculated
results with our method are noticeably better for all first-row elements, especially for the so-called 2s-2p near-degeneracy
atoms Be, B and C. Approximately 10% more correlation energy is recovered by our analytic energy derivative method.

To demonstrate the power of our analytic energy minimization approach more fully, we optimize a forty-two parameter wave
function, starting from the nine-parameter trial function discussed above. We use all terms with $m + n \le 4$ combined with
$o \le 3$, $m = n = 0$ with $o = 4$, and all terms with $m + n > 4$ and $m \le 4$, $n \le 4$ with $o = 0$. The same cusp, $b_I$ and $d_I$
constraints were obeyed.
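The selection rule can be checked by direct enumeration. The sketch below rests on two assumptions that are not stated explicitly in the text: pairs $(m, n)$ and $(n, m)$ are counted once, since the correlation term is symmetrized in the two electrons, and the constant term $m = n = o = 0$ is omitted. Under these assumptions the count comes out to forty-two, and the nine terms of the smaller wave function (as listed in Tables I and II) form a subset. The function name `term_indices` is hypothetical.

```python
def term_indices():
    """Enumerate (m, n, o) index triples of the forty-two parameter wave function."""
    terms = set()
    for m in range(5):
        for n in range(m + 1):           # n <= m: (m, n) and (n, m) give the same term (assumption)
            if m + n <= 4:
                for o in range(4):       # o = 0, 1, 2, 3
                    terms.add((m, n, o))
            else:
                terms.add((m, n, 0))     # m + n > 4 with m, n <= 4 and o = 0
    terms.add((0, 0, 4))                 # m = n = 0 with o = 4
    terms.discard((0, 0, 0))             # constant factor, dropped (assumption)
    return sorted(terms)

all_terms = term_indices()
nine_terms = [(0, 0, 1), (0, 0, 2), (0, 0, 3), (0, 0, 4),
              (2, 0, 0), (3, 0, 0), (4, 0, 0), (2, 2, 0), (2, 0, 2)]
```

The $(0, 0, 1)$ coefficient remains fixed by the cusp condition, as in the nine-parameter case.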
TABLE I. Optimized ground state wave function and variational energy (with error bar and correlation energy percentage) for atoms He to C.

m n o                            He           Li           Be            B            C
0 0 1                     0.2500000    0.2500000    0.2500000    0.2500000    0.2500000
0 0 2                    -0.0094564    0.0143877    0.1977687    0.0594379   -0.1413218
0 0 3                     0.1214671    0.2761786   -0.8396261   -0.6320118   -0.1285105
0 0 4                    -0.1399809   -0.5225103    0.0634756    0.0444298   -0.2202719
2 0 0                     0.2569693   -0.0625743   -0.3428204   -0.2402583   -0.1269579
3 0 0                    -0.1316968    0.1942677    1.3266686    1.0019282    0.5326180
4 0 0                    -0.8487197   -0.5490759   -2.1688741   -1.8251190   -1.2566210
2 2 0                    -1.2608994   -0.5235010   -1.1187348   -1.0333565   -0.8918771
2 0 2                     0.8683429    0.6336047    2.1862056    1.9776332    1.6388292
Energy (Ha)             -2.90322(3)  -7.47498(5)  -14.6413(2)  -24.6206(3)  -37.8054(3)
Correlation (%)                  99           93           72           73           75
Energy, Ref. (Ha)        -2.9029(1)   -7.4731(6)  -14.6332(8)  -24.6113(8)  -37.7956(7)
Correlation, Ref. (%)            98           89           64           66           68
Energy-42 (Ha)         -2.903717(8)  -7.47722(4)  -14.6475(1)  -24.6257(1)  -37.8116(2)
Correlation-42 (%)              100           98           79           77           79

TABLE II. Optimized ground state wave function and variational energy (with error bar and correlation energy percentage) for atoms N to Ne.

m n o                             N            O            F           Ne
0 0 1                     0.2500000    0.2500000    0.2500000    0.2500000
0 0 2                    -0.2657443   -0.3727767   -0.4141830   -0.4715589
0 0 3                     0.1906864    0.4670193    0.5988020    0.7230792
0 0 4                    -0.4252186   -0.6653063   -0.7861718   -0.8802268
2 0 0                    -0.0314994    0.0354552    0.0879260    0.0690328
3 0 0                     0.2343842    0.1581261   -0.0123869    0.0270636
4 0 0                    -0.9314224   -0.8723734   -0.6392097   -0.6689391
2 2 0                    -0.9111045   -1.0736302   -1.1368462   -1.1774526
2 0 2                     1.5219105    1.5985734    1.5418886    1.5606005
Energy (Ha)             -54.5477(3)  -75.0168(1)  -99.6792(2) -128.8832(1)
Correlation (%)                  78           80           84           86
Energy, Ref. (Ha)       -54.5390(6)  -75.0109(4)  -99.6685(5) -128.8771(5)
Correlation, Ref. (%)            73           78           80           85
Energy-42 (Ha)          -54.5563(2)  -75.0270(1)  -99.6912(2) -128.8910(2)
Correlation-42 (%)               82           84           88           88

It is also interesting to note that in a recent VMC calculation for the atoms Be, B and C, the use of additional Slater determinants
enabled the authors to recover an amount of correlation energy similar to ours. Our current work demonstrates that this 2s-2p
near-degeneracy effect for the first-row atoms accounts for less than 25% of the correlation energy.

In a typical optimization procedure with this energy derivative method, the energy value and its associated error bar decrease
with the first (and possibly second) parameter moves. After that, the forces are much smaller than their error bars, indicating a
local minimum. Table III shows an example for the carbon atom.

However, if we start from Schmidt and Moskowitz's optimized wave function rather than taking all zeroes as the initial guess
for the variational parameters, a smaller but still sharp decrease occurs at the first iteration. Taking the B atom as an example,
after one iteration we obtained about 7% more correlation energy.

As one can see from Figs. 1-3, the energy derivatives are much smoother than the energy itself. As a result, it is much easier
to find the parameter value which gives $dE/dc = 0$ than to locate the minimum from energy data alone. As discussed in Section
III, the general theorem of the local value derivatives permits reduction of the noise associated with the energy derivatives, giving a
much more efficient and reliable wave function optimization in VMC.

After the optimization, the Hessian matrix is diagonalized to check the positivity of the eigenvalues. All of the eigenvalues are
positive or small negative numbers. A positive definite Hessian guarantees that all moves are downhill and reach a true local minimum.
The negative eigenvalues are much smaller in magnitude than their error bars, indicating search directions with tiny positive curvature.
TABLE III. An optimization procedure for atom C, with initial parameters as zeroes.

Iteration    Energy       Error bar
0           -37.68745     0.00039
1           -37.80080     0.00013
2           -37.80945     0.00012
3           -37.80901     0.00011
4           -37.80918     0.00011



TABLE IV. An optimization procedure for atom B, with initial values taken from the optimized wave function of Schmidt and Moskowitz.

Iteration    Energy       Error bar
0           -24.61109     0.00027
1           -24.62044     0.00028
2           -24.62058     0.00029
3           -24.62043     0.00028
4           -24.62083     0.00028



V. CONCLUSIONS 



We have explored a new method to optimize wave functions in VMC calculations. This method is a direct application of 
energy minimization. It is very efficient, giving quadratic convergence, and it is straightforwardly applicable to systems having 
a large number of parameters. In direct comparisons using identical trial wave functions, the current method yields significantly 
lower energy expectation values than are achieved with variance minimization for all first-row atoms. 
FIG. 1. Energy minimization: energies and error bars for the Be atom, as the parameter for m = 4, n = 0, o = 0 is varied.

FIG. 2. Energy minimization: first derivative of the energy with respect to the same parameter as in Fig. 1.